This report explores the Prosper Loan Dataset. Prosper is an online marketplace lending(also called peer-to-peer lending) that connects borrowers and investors. Investors can explore credit-worthy borrowers by FICO score, ratings, and terms and then invest in unsecured loans( or a part of the loan) and earn returns.
I am a financial market enthusiast and follow it daily basis. I believe the financial markets are the reflection of human psychology. The year 2008 financial crisis was due to subprime mortgage loans, derivatives, too much leverage, and housing bubble that caused a global crisis and lead to the demise of financial firms, required trillion-dollar taxpayer bailouts, and caused a recession that matched the Great Depression in its magnitude.
Prosper loan is a peer-to-peer lending/loan marketplace in the United States. Prosper loan dataset contains records from year 2006 to 2014. I am trying to understand why debt delinquency happens and what can be done to reduce it. Is borrower financial situation(features like AmountDelinquent, LoanCurrentDaysDelinquent, Occupation, EmploymentStatus, CreditScoreRangeLower, CreditScoreRangeUpper etc. ) or lenders business model (features like ProsperRating..Alpha. , IncomeVerifiable, BorrowerAPR etc ) has role to play.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
From the summary details, the average loan is $6,500. Majority of the loans are less than 10,000 as shown in the positively skewed plot thus mean($8,337) more than the median($6,500).
Next obvious question to ask what kind of services borrowers are buying from loan money.
Most of the borrowers are using the loan for debt consolidation which could be credit card debt and home improvement.
Next looking into employment status of borrower.
Most of loan takers are “Employed”, “Full-time” and “self-employed” - which suggest that there is high probabity that loan will be paid in full.
Next looking into Income Range of borrower.
Most of the borrower’s income is in the range of $ 25K - $ 75K.
Next looking into annual percentage rate, borrower’s APR(interest rate plus other costs ) for the loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229 25
Borrower’s APR has a mean of 0.2188 or 21.88% more than the median (0.20976 or 20.97%). We also see loan APR more 30%, which is pretty high and tip borrower to delinquency.
Next we look into Loan status .
Frequencies of the loan that are “completed” and “current” are more. It would be intersting to see “Chargedoff” and “Defaulted” loan more closely in terms of dollar(s) amount and reasons behind it - may be features like “Employmentstatus”, “DebtToIncomeRatio”,“LoanCurrentDaysDelinquent” may give us some clue.
Next, it would be interesting to look at the borrower’s debt to income ratio. Higher debt to income ratio is a recipe for delinquency.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
Average borrower has a debt to income ratio of 0.27 or 27% less than the typical 33%.
Next, looking into Credit score.
As credit score appear to be normally distributed - Proposer’s overall risk is reasonable.
Plotting feature “LoanOriginationQuarter” could be impressive as US Housing buble burst resulting credit crisis and subsequent 2007-2009 recession. The market bottomed by June 2009.
Above frequency plot does show a sharp dip in loan origination starting “Q4 2008” and then slowly gaining momentum from “Q3 2009” onward.
Plotting Prosper ratings. Prosper Rating is a proprietary rating system that allows potential investors evaluation of loan applicants.
A large count of loan does not have Prosper Rating assigned to them. This may be because the proprietary rating system developed after the 2008-2009 credit crisis.
Let’s see the distribution of amount delinquent at the time credit profile pulled.
Above plot show that most of the borrowers have less than $1000 delinquent at the time credit was pulled.
Looking into the summary of some the variables like “LoanOriginalAmount”,“MonthlyLoanPayment”,
## LoanOriginalAmount MonthlyLoanPayment BorrowerAPR
## Min. : 1000 Min. : 0.0 Min. :0.00653
## 1st Qu.: 4000 1st Qu.: 131.6 1st Qu.:0.15629
## Median : 6500 Median : 217.7 Median :0.20976
## Mean : 8337 Mean : 272.5 Mean :0.21883
## 3rd Qu.:12000 3rd Qu.: 371.6 3rd Qu.:0.28381
## Max. :35000 Max. :2251.5 Max. :0.51229
## NA's :25
## StatedMonthlyIncome MonthlyLoanPayment.1 Investors
## Min. : 0 Min. : 0.0 Min. : 1.00
## 1st Qu.: 3200 1st Qu.: 131.6 1st Qu.: 2.00
## Median : 4667 Median : 217.7 Median : 44.00
## Mean : 5608 Mean : 272.5 Mean : 80.48
## 3rd Qu.: 6825 3rd Qu.: 371.6 3rd Qu.: 115.00
## Max. :1750003 Max. :2251.5 Max. :1189.00
##
The structure of Prosper Dataset has numeric and factor variables. Prosper Dataset has total 113937 observations of 81 variables.
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ CreditGrade : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
## $ DateCreditPulled : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
I intend to understand the reasons behind borrower delinquency and Prosper/ Investor response to credit lending practices.
My focus is on the following features - “LoanOriginalAmount”,“ListingCategory..name”,“EmploymentStatus”, “IncomeRange”,“BorrowerAPR”,“LoanStatus”,“DebtToIncomeRatio”,“CreditScoreRangeLower”,“CreditScoreRangeUpper”,“LoanOriginationQuarter”, and “ProsperRating..Alpha.”
I did create two variables namely “ListingCategory_name” and “level_order”
I did re-order Loan Origination Quarter to plot histogram.
Note: https://www.prosper.com/plp/general-prosper_score/ APRs through Prosper range from 6.95% (AA) to 35.99% (HR) for first-time borrowers, with the lowest rates for the most creditworthy borrowers.
It looks like majority of loans given upto 2008-2009 does not show proper rating bute after “Q2 2009” most of the loans are of “AA”, “A” or “B” i.e. loan with low percentage.
After loan credit crises due to the housing bubble, minimum credit score was raised.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
After loan credit crises due to the housing bubble, there is not only increase in minimum credit score but the implementation of the Prosper Rating system to help investors buy the loan with risk awareness.
Let’s also look into Prosper rating versus BorrowerAPR.
The plot shows that as rating go lower borrower’s APR increases. That’s normal. Outliers are removed.
summary(df$Investors)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
We see a trend of investors investing in higher loan dollars per Prosper rating system. Most of the investors are investing in Prosper rating “B” and “C” category - which appears driver purely by coupon yield.
Lets group borrower by income range and look for loan amount mean.
Higher the income range - higher the loan amount provided to the borrower. At the same time, we have to be careful about borrower’s debt to income ratio (below 33% is considered safe). We are plotting next
Lets look into level of order IncomeRange Versus Debt to income ratio
So the plot shows that lower income range has higher “DebtToIncomeRatio” compare to the higher income group thus lower-income group more risk to delinquency.
Comparing Prosper rating Vs. Credit score, original loan amount, and amount delinquent.
Most of the delinquent dollars are less than $1000 of the borrowed original $10000(mean) is mainly concentrated towards “HR”" Prosper rating and not so bad lower credit range.
Here is some correlation matrix of selected features
There is a strong correlation between “LoanOriginalAmount” and “MonthlyLoanPayment.”
##
## Pearson's product-moment correlation
##
## data: df$LoanOriginalAmount and df$MonthlyLoanPayment
## t = 867.82, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9312165 0.9327426
## sample estimates:
## cor
## 0.9319837
As the loan amount increases so is the monthly payment - that’s obvious.Corelation is quite strong.
## # A tibble: 2,468 x 5
## LoanOriginalAmo~ mean_AmountDeli~ mean_BorrowerAPR mean_CreditScor~
## <int> <dbl> <dbl> <dbl>
## 1 1000 NA NA NA
## 2 1001 NA 0.206 NA
## 3 1005 NA 0.340 530
## 4 1010 6683 0.0898 540
## 5 1025 NA 0.204 NA
## 6 1030 NA 0.271 600
## 7 1031 9720 0.236 690
## 8 1032 3207 0.263 560
## 9 1035 NA 0.188 500
## 10 1036 NA 0.0866 820
## # ... with 2,458 more rows, and 1 more variable: n <int>
##
## Pearson's product-moment correlation
##
## data: df.df_by_LoanOriginalAmount$LoanOriginalAmount and df.df_by_LoanOriginalAmount$mean_BorrowerAPR
## t = -18.579, df = 2451, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3855602 -0.3161646
## sample estimates:
## cor
## -0.3513449
##
## Pearson's product-moment correlation
##
## data: df.df_by_LoanOriginalAmount$LoanOriginalAmount and df.df_by_LoanOriginalAmount$mean_CreditScoreRangeLower
## t = 23.743, df = 2358, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4061005 0.4712489
## sample estimates:
## cor
## 0.439252
The above plot shows that there is a negative correlation(-0.3513449) between “Groupby(Loan Original Amount)” and “Mean Borrower APR.” As “Groupby(Loan Original Amount)” increases there is a decrease in “Mean Borrower APR.”
The above plot shows that there is a positive correlation(0.439252) between “Groupby(Loan Original Amount)” and “Mean Credit Score Range Lower.” As “Groupby(Loan Original Amount)” increases there is an increase in “Mean Credit Score Range Lower.”
After loan credit crises due to the housing bubble, there is not only increase in minimum credit score but the implementation of the Prosper Rating system to help investors buy the loan with risk awareness.
Better quality of loan after “Q2 2009”.
Investors investing in higher loan dollars for “AA”, “A” ratings.
Higher the income range higher loan amount. But we have to watch out for borrower’s debt to income ratio (preferably below 33%)
Lower income range borrower has higher “DebtToIncomeRatio” compare to the higher income group thus lower-income group more risk to delinquency. Prosper ratings are inversely proportional to borrower’s APR.
There is a negative correlation(-0.3513449) between “Groupby(Loan Original Amount)” and “Mean Borrower APR.” As “Groupby(Loan Original Amount)” increases there is a decrease in “Mean Borrower APR.”
There is a positive correlation(0.439252) between “Groupby(Loan Original Amount)” and “Mean Credit Score Range Lower.” As “Groupby(Loan Original Amount)” increases there is an increase in “Mean Credit Score Range Lower.”
Most of the delinquent dollars are less than $1000 of the borrowed original $10000(mean) is mainly concentrated towards poor Prosper rating and not so bad lower credit range or upper credit range.
There is a strong relationship between Prosper rating and Borrower’s APR.
Most of the loan issued fall under 36-months Term. There are excess of loans with Prosper rating of “HR”, “E” and “D” and increasing APR. The quantity of loans debt issued for 12 months is quite less in comparison to 36 and 60 months. There appears to be a fair distribution of borrowers " Amount Delinquent " at the time credit was pulled, and it may be because most of the borrowers are trying to consolidate debt.
Most of the delinquent dollars between 200-700 have more than 200 days delinquent. Most of the delinquent data points are from Prosper rating “D”, “E”, and “HR” - which seems logical. We do see some Prosper rated loan “A” in delinquency - which raises doubt over faithfulness of the Prosper rating system.
Now looking into the relation between feature “Investors,” “LoanOriginalAmount,” and Prosper rating.Note: Entries where Prosper rating not listed are filtered.
##
## Pearson's product-moment correlation
##
## data: my_df_01$Investors and my_df_01$LoanOriginalAmount
## t = 98.125, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3131790 0.3252646
## sample estimates:
## cor
## 0.3192348
The above plot shows that there is a positive correlation(0.3192348) between “Investors,” and “LoanOriginalAmount.”
Additionally, we see that there is a gradual increase in “Investors” count for “LoanOriginalAmount” for higher rated “AA” and “A” borrowers. Poor rated have a lower count of “Investors”. Now, this makes sense and Prosper rating were implemented after the Real Estate bubble burst - “Q3 2009”.
The above plot shows that most of the loans were for 36 months and the estimated effective yield is higher as the Prosper rating decreases i.e., “C”, “D”, “E”, and “HR”. It appears that the lure of higher yield motivated the investors to invest in loans of the poor rating. This is more the reflection of time before the credit crisis where robot signing was the norm leading to world-wide recession.
We see trend of better quality,Prosper rated “AA”, loan after from year 2009. We do see some poor quality rated loan like “D” and “E” - which indicates investors affinity for higher yield. It would be interesting to see Prosper Dataset from the year 2014 onwards to know Investor sentiment towards yield when the federal interest rate is on the rise
Let’s look into the feature “Income Verifiable versus”Loan Original Amount."
It seems strange to know how Prosper rated borrowers if borrowers’ income were not verified. Maybe credit score was used. The plot shows that the number of loans given to borrowers where income was verified is more in comparison to not. Also, there is a gradual increase in the size of the loan amount as income is checked and after “Q2 2008” as Proper rating were implemented.
lets create liner model m1 with response variable: BorrowerAPR and predictor variable: LoanOriginalAmount
Additionally updating: model m1 by m2 by adding variable “ProsperRating..Alpha.” , model m2 by m3 by adding variable “DebtToIncomeRatio”, model m3 by m4 by adding variable “Term” and in the linear regression
m1 = lm(BorrowerAPR ~ LoanOriginalAmount, data = df)
m2 = update(m1, ~. + ProsperRating..Alpha.)
m3 = update(m2, ~. + DebtToIncomeRatio)
m4 = update(m3, ~. + factor(Term))
mtable("Model 1" = m1,
"Model 2" = m2,
"Model 3" = m3,
"Model 4" = m4)
##
## Calls:
## Model 1: lm(formula = BorrowerAPR ~ LoanOriginalAmount, data = df)
## Model 2: lm(formula = BorrowerAPR ~ LoanOriginalAmount + ProsperRating..Alpha.,
## data = df)
## Model 3: lm(formula = BorrowerAPR ~ LoanOriginalAmount + ProsperRating..Alpha. +
## DebtToIncomeRatio, data = df)
## Model 4: lm(formula = BorrowerAPR ~ LoanOriginalAmount + ProsperRating..Alpha. +
## DebtToIncomeRatio + factor(Term), data = df)
##
## =============================================================================================
## Model 1 Model 2 Model 3 Model 4
## ---------------------------------------------------------------------------------------------
## (Intercept) 0.253*** 0.201*** 0.200*** 0.196***
## (0.000) (0.000) (0.000) (0.001)
## LoanOriginalAmount -0.000*** -0.000*** -0.000*** -0.000***
## (0.000) (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: A -0.053*** -0.052*** -0.053***
## (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: AA -0.102*** -0.101*** -0.101***
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: B -0.008*** -0.007*** -0.008***
## (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: C 0.033*** 0.034*** 0.033***
## (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: D 0.085*** 0.086*** 0.084***
## (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: E 0.133*** 0.133*** 0.132***
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: HR 0.158*** 0.159*** 0.158***
## (0.001) (0.001) (0.001)
## DebtToIncomeRatio 0.002*** 0.002***
## (0.000) (0.000)
## factor(Term): 36/12 0.004***
## (0.001)
## factor(Term): 60/12 0.009***
## (0.001)
## ---------------------------------------------------------------------------------------------
## R-squared 0.104 0.716 0.707 0.707
## adj. R-squared 0.104 0.716 0.707 0.707
## sigma 0.076 0.043 0.043 0.043
## F 13258.004 35811.700 28189.519 23117.848
## p 0.000 0.000 0.000 0.000
## Log-likelihood 131830.816 197159.800 181848.123 181935.419
## Deviance 658.982 209.283 195.437 195.113
## AIC -263655.632 -394299.600 -363674.247 -363844.838
## BIC -263626.702 -394203.168 -363569.030 -363720.492
## N 113912 113912 105358 105358
## =============================================================================================
From the model out we can interpret that with R-squared and adjusted R-squared equal 0.707 i.e 70.7 % and p-value = 0 it is a nice model.
Also we can see that slope turn from negative to positive as Proper rating goes shift from A towards HR, prediting higher Borrower APR.
As “Groupby(Loan Original Amount)” increases there is a decrease in “Mean Borrower APR.”
As “Groupby(Loan Original Amount)” increases there is an increase in “Mean Credit Score Range Lower.”
Though small, it is visible that loan given with poor Prosper rating, i.e., “C”, “D”, “E”, and “HR” tend to have delinquencies(though small amount) as borrower APR increases.
It appears that the lure of higher yield motivated the investors to invest in loans of the poor rating. This is more to the reflection of time before the credit crisis where robot signing was the norm - leading to recession.
I build a model with - response variable: BorrowerAPR and predictor variable: LoanOriginalAmount
Additionally updating: model m1 by m2 by adding variable “ProsperRating..Alpha.” , model m2 by m3 by adding variable “DebtToIncomeRatio”, model m3 by m4 by adding the variable “Term” and in the linear regression Though I got relatively high R-squared and adjusted R-squared value( equal to 0.707) - it does not necessarily indicate model is a good fit. We need to evaluate other model statistics to get a comprehensive insight.
I have chosen this plot because it shows a vivid dip starting quater “Q2 2008”. Well, Bear Sterns collapsed in March 2008 and set the stage for worst debt crisis since the Great Depression. The Prosper dataset has reflections of those times - the crash of the real estate bubble and the collapse of commercial credit from the last recession.
Additinally,(https://www.sec.gov/litigation/admin/2008/33-8984.pdfhttps://www.sec.gov/litigation/admin/2008/33-8984.pdf) on November 24, 2008, the SEC found Prosper to be in violation of the Securities Act of 1933. As a result of these findings, the SEC imposed a cease and desist order on Prosper … In July 2009, Prosper reopened their website for lending (“investing”) and borrowing after having obtained SEC registration for its loans (“notes”). After the relaunch, bidding on loans was restricted to residents of 28 U.S. states and the District of Columbia. Borrowers may reside in any of 47 states, with residents of three states (Iowa, Maine, and North Dakota) not permitted to borrow through Prosper.
Resulting, government introduced regulations (https://en.wikipedia.org/wiki/Regulatory_responses_to_the_subprime_crisis) to better manage risk. Prosper introduced the rating system to help the investor invest prudently. Thus the plot show after “Q3 2009” a gradual increase in loan origination with better risk management in place. As time passes humans forget - regulations are rolled back(https://www.nytimes.com/2018/01/01/us/politics/trump-businesses-regulation-economic-growth.html) - and the stage is set for next recession.
## # A tibble: 2,468 x 6
## LoanOriginalAmo~ mean_AmountDeli~ mean_BorrowerAPR mean_CreditScor~
## <int> <dbl> <dbl> <dbl>
## 1 1000 NA NA NA
## 2 1001 NA 0.206 NA
## 3 1005 NA 0.340 530
## 4 1010 6683 0.0898 540
## 5 1025 NA 0.204 NA
## 6 1030 NA 0.271 600
## 7 1031 9720 0.236 690
## 8 1032 3207 0.263 560
## 9 1035 NA 0.188 500
## 10 1036 NA 0.0866 820
## # ... with 2,458 more rows, and 2 more variables:
## # mean_DebtToIncomeRatio <dbl>, n <int>
##
## Pearson's product-moment correlation
##
## data: df.df_by_LoanOriginalAmount$LoanOriginalAmount and df.df_by_LoanOriginalAmount$mean_BorrowerAPR
## t = -18.579, df = 2451, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3855602 -0.3161646
## sample estimates:
## cor
## -0.3513449
##
## Pearson's product-moment correlation
##
## data: df.df_by_LoanOriginalAmount$LoanOriginalAmount and df.df_by_LoanOriginalAmount$mean_CreditScoreRangeLower
## t = 23.743, df = 2358, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4061005 0.4712489
## sample estimates:
## cor
## 0.439252
##
## Pearson's product-moment correlation
##
## data: df.df_by_LoanOriginalAmount$LoanOriginalAmount and df.df_by_LoanOriginalAmount$mean_DebtToIncomeRatio
## t = -0.60306, df = 2094, p-value = 0.5465
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05596120 0.02965434
## sample estimates:
## cor
## -0.01317758
The above plot shows that there is a negative correlation(-0.3513449) between “Groupby(Loan Original Amount)” and “Mean Borrower APR.” As “Groupby(Loan Original Amount)” increases there is a decrease in “Mean Borrower APR.”
There is a positive correlation(0.439252) between “Groupby(Loan Original Amount)” and “Mean Credit Score Range Lower.” As “Groupby(Loan Original Amount)” increases there is an increase in “Mean Credit Score Range Lower.”
There is a negative correlation(-0.01317758 ) between “Groupby(Loan Original Amount)” and “Mean mean_DebtToIncomeRatio.” - which is good trend. Borrowers with the higher range of lower credit score given lower ARP for increasing loan amount. This makes sense for Borrowers having less than 33%( as a general rule ). As the debt-to-income ratio increases, borrowers ability to pay monthly debt payments become difficult and may lead to delinquency. (Ref:https://www.consumerfinance.gov/ask-cfpb/what-is-a-debt-to-income-ratio-why-is-the-43-debt-to-income-ratio-important-en-1791/)
Most of the delinquent dollars from $200-$700 have more than 200 days delinquencies. Most of the delinquent data points are from Prosper rating “D”, “E”, and “HR” - which seems logical. We do see some Prosper rated loan “A” in delinquency - which raises doubt over faithfulness of the Prosper rating system.
Not sure why 36-months Term is more popular than 12 months or 60 months - Maybe it is a sweet spot where ARP and monthly payment are just appropriate.
My goal was to understand why debt delinquency happens and what can be done to reduce it. Prosper dataset has 113937 observations of 81 variables. I struggled in choosing the right features from this huge dataset, and it took a lot of time.
At times I used conditional means(library - dpyr ) for analysis and plotting.
I would prefer to analyze Proper dataset up to the year - 2018 to better understand how their rating system is helping investors to make the prudent debt investment. Prosper loans are unsecured - a high APR range ( 6.95% - 35.99% ) and poor Maximum debt-to-income ratio: 50% (excluding mortgage) is quite risky for the investor as well as borrowers(Ref: https://www.nerdwallet.com/blog/loans/prosper-personal-loans-review/)
As of August-2018, With low unemployment, 3.9% (ref:https://tradingeconomics.com/united-states/unemployment-rate) and decresing government regulations it will interesting see how Prosper has morphed its business model - latest dataset analysis can provide interesting insights.